Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets

Authors

  • Roberto Todeschini
  • Viviana Consonni
  • Hua Xiang
  • John D. Holliday
  • Paolo Massimo Buscema
  • Peter Willett
Abstract

This paper reports an analysis and comparison of the use of 51 different similarity coefficients for computing the similarities between binary fingerprints for both simulated and real chemical data sets. Five pairs and a triplet of coefficients were found to yield identical similarity values, leading to the elimination of seven of the coefficients. The remaining 44 coefficients were then compared in two ways: by their theoretical characteristics using simple descriptive statistics, correlation analysis, multidimensional scaling, Hasse diagrams, and the recently described atemporal target diffusion model; and by their effectiveness for similarity-based virtual screening using MDDR, WOMBAT, and MUV data. The comparisons demonstrate the general utility of the well-known Tanimoto method but also suggest other coefficients that may be worthy of further attention.
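As a rough illustration of the kind of calculation the abstract refers to, the sketch below computes the Tanimoto coefficient between two binary fingerprints and ranks a small database against a query, as in similarity-based virtual screening. It uses the standard definition c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The function names, the toy fingerprints, and the convention of returning 0.0 when both fingerprints are empty are assumptions made for this example, not details taken from the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length binary fingerprints (lists of 0/1)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                # bits set in the first fingerprint
    b = sum(fp_b)                                # bits set in the second fingerprint
    c = sum(x & y for x, y in zip(fp_a, fp_b))   # bits set in both
    denom = a + b - c
    # Assumption: two all-zero fingerprints are given similarity 0.0.
    return c / denom if denom else 0.0


def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted by decreasing Tanimoto similarity to the query."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: names and bit patterns are invented for illustration.
query = [1, 1, 0, 1, 0, 0, 1, 0]
database = {
    "mol_A": [1, 1, 0, 1, 0, 0, 0, 0],
    "mol_B": [0, 0, 1, 0, 1, 1, 0, 1],
    "mol_C": [1, 0, 0, 1, 0, 0, 1, 1],
}

ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Other coefficients of the kind compared in the paper could be swapped in by replacing the `tanimoto` function while keeping the same a, b, c counts.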


Similar resources

Identification of an Autonomous Underwater Vehicle Dynamic Using Extended Kalman Filter with ARMA Noise Model

In the procedure of designing an underwater vehicle or robot, its maneuverability and controllability must be simulated and tested, before the product is finalized for manufacturing. Since the hydrodynamic forces and moments highly affect the dynamic and maneuverability of the system, they must be estimated with a reasonable accuracy. In this study, hydrodynamic coefficients of an autonomous un...


A New Surface Tension Model for Prediction of Interaction Energy between Components and Activity Coefficients in Binary Systems

In this work, we develop a correlative model based on the surface tension data in order to calculate thermodynamic parameters, such as interaction energy between components (Uij), activity coefficients and etc. In the new approach, by using Li et al. (LWW) model, a three-parameter surface tension equation is derived for liquid mixtures. The surface tension data of 54 aqueous and 73 non-aqueous ...


On the use of Heronian means in a similarity classifier

This paper introduces new similarity classifiers using the Heronian mean, and the generalized Heronian mean operators. We examine the use of these operators at the aggregation step within the similarity classifier. The similarity classifier was earlier studied with other operators, in particular with an arithmetic mean, generalized mean, OWA operators, and many more. The two classifiers here ar...


On-Line Nonlinear Dynamic Data Reconciliation Using Extended Kalman Filtering: Application to a Distillation Column and a CSTR

Extended Kalman Filtering (EKF) is a nonlinear dynamic data reconciliation (NDDR) method. One of its main advantages is its suitability for on-line applications. This paper presents an on-line NDDR method using EKF. It is implemented for two case studies, temperature measurements of a distillation column and concentration measurements of a CSTR. In each time step, random numbers with zero m...


An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...



Journal:
  • Journal of Chemical Information and Modeling

Volume 52, Issue 11

Pages: -

Publication date: 2012